Fullstack Development of a Scientific Information System
The project concerned the development, deployment, and installation of the OpenSilex Scientific Information System. It took place between November 2021 and July 2022, in Montpellier.
Several objectives ran through the different missions, all related to OpenSilex. The first was to add functionality for tracking the trajectories of scientific objects and associating them with environmental data. The second was to develop an R Shiny package that makes it easier for applications to connect to OpenSilex. The third was to migrate lake data (Answer project) to the most recent version of OpenSilex. The last was to install and configure a new OpenSilex instance for managing experimental data on vines and wine, as part of the Sinfonia project.
The involved parties included UMR MISTEA for OpenSilex-related developments, IFV for open-source development and Sinfonia instance management, and the Answer project for lake data migration.
Tasks & Objectives
My main role in this project was developer, with additional Semantic Web consulting missions, particularly for data modeling in the Sinfonia project.
The objectives varied according to the missions:
- enabling the association of meteorological data with the movement of a scientific object
- providing a reliable connection to the server through an efficient R client packaged as a reusable Shiny library
- migrating the Answer project's data to the most recent OpenSilex version
- configuring and deploying the Sinfonia OpenSilex instance.
The success of each task was measured by the proper functioning of the system or specific functionalities, within the framework of the defined mission.
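The first objective, associating meteorological data with the movement of a scientific object, can be illustrated with a minimal Python sketch. It is not the OpenSilex implementation (the environmental data service itself is written in Java). The `Move` and `Reading` records, their field names, and the notion of a `facility` are assumptions made for this illustration. The idea is simply to keep the readings taken in the place where the object was located at the time of measurement.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Move:
    """Hypothetical record: the object arrives in `facility` at time `start`."""
    start: datetime
    facility: str


@dataclass(frozen=True)
class Reading:
    """Hypothetical environmental measurement taken in a facility."""
    time: datetime
    facility: str
    variable: str
    value: float


def readings_along_trajectory(moves, readings):
    """Keep the readings taken where the object was at measurement time."""
    ordered = sorted(moves, key=lambda m: m.start)
    starts = [m.start for m in ordered]
    matched = []
    for reading in sorted(readings, key=lambda r: r.time):
        # Index of the last move that started at or before the reading.
        i = bisect_right(starts, reading.time) - 1
        if i >= 0 and ordered[i].facility == reading.facility:
            matched.append(reading)
    return matched
```

Sorting the moves once and using a binary search keeps the lookup cheap when an object has many moves. Readings taken before the first known move are discarded, since the object's position is unknown at that point.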
Actions and Development
Concretely, I added new functionalities to the OpenSilex software, built a generic library for Shiny applications, performed custom ETLs to migrate data between two major versions of OpenSilex, and installed and configured a new system instance.
These actions were part of research and development projects, in close collaboration with scientific and technical managers. The use of ontologies, often complex to manipulate, was central to this information system.
I worked with UMR MISTEA, IFV, agronomists, agricultural engineers, and the OpenSilex development team.
Among the challenges encountered, the code was still at the prototype stage and evolving rapidly, which complicated manual migrations, while the specificities of each scientific domain complicated modeling. The lack of automation and the particularities of the API clients also made certain tasks harder.
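To give an idea of what a migration step between two model versions involves, here is a minimal Python sketch. It is an assumption-laden illustration, not the actual migration scripts: the field names (`alias`, `name`, `depth_sim`) and the `field_map` structure are hypothetical and do not reflect the real OpenSilex models. The point is that fields with no known target are reported rather than silently dropped, which matters when the model changes quickly.

```python
def migrate_record(old, field_map, defaults=None):
    """Return (new_record, unmapped_fields) for one source record.

    `field_map` maps old field names to new ones (hypothetical names).
    """
    new = dict(defaults or {})
    unmapped = []
    for key, value in old.items():
        target = field_map.get(key)
        if target is None:
            # No known equivalent in the new model: report it for review.
            unmapped.append(key)
        else:
            new[target] = value
    return new, sorted(unmapped)


def migrate_all(records, field_map, defaults=None):
    """Migrate every record and collect all fields that need manual review."""
    migrated = []
    needs_review = set()
    for record in records:
        new, unmapped = migrate_record(record, field_map, defaults)
        migrated.append(new)
        needs_review.update(unmapped)
    return migrated, sorted(needs_review)
```

The list of fields returned for review is where domain-specific decisions, such as how to represent simulation outputs in the new model, still have to be made by hand.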
Technical decisions were made collectively with the development team and steering committee, taking into account user needs and scientific constraints.
Results
The project delivered a new OpenSilex instance, installed and configured for managing experimental data on vines and wine (Sinfonia); a successful migration of data to the most recent version of the software; and an R package that served as the basis for a Shiny application offering more specific data visualization.
Among the lessons learned, I observed how crucial code generation tools and adherence to specifications are to the proper functioning of an information system. Manual data migration, given the specificities of different scientific domains, requires particular rigor. Rapid evolution of the base model, sometimes without backward compatibility, can break things for users, hence the importance of respecting semantic versioning.
Finally, solid technical skills in building API clients, particularly in R/Shiny, are essential.
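The semantic versioning lesson can be made concrete with a small Python sketch of the check an API client might run before talking to a server. This is an illustrative assumption, not part of the R client described here: the function names are hypothetical, and the parser handles only plain `MAJOR.MINOR.PATCH` strings, ignoring pre-release and build tags.

```python
def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_backward_compatible(client_built_for, server):
    """True if a client built for one version can expect to work with `server`.

    Under semantic versioning, a change of MAJOR signals a breaking change,
    while a higher MINOR only adds functionality. PATCH is ignored here.
    """
    client_major, client_minor, _ = parse_version(client_built_for)
    server_major, server_minor, _ = parse_version(server)
    return server_major == client_major and server_minor >= client_minor
```

When the server does not follow this convention, a check of this kind gives false confidence, which is exactly the situation that makes untagged breaking changes costly for client maintainers.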
Technical Stack
The project relies on the following tools and technologies:
- Migration scripts: Python
- OpenSilex client: R/Shiny
- Environmental data service: Java
- Semantic queries: SPARQL
This technical stack was inherited: I did not take part in the initial choices. The major technical challenges were:
- Data migration for the Answer project: the model had evolved considerably between versions and the data were specific to mathematical simulations, so many adjustments were needed to fit them to the new model. The lack of documentation further complicated the process.
- Development of the environmental data retrieval service: the complexity of the physical data model and problems in the software architecture made the task more difficult than it should have been.